Discovery of Ambiguous and Unambiguous Discourse Connectives via Annotation Projection

Author

  • Yannick Versley
Abstract

We present work on tagging German discourse connectives using English training data and a German-English parallel corpus, and report first results towards a more comprehensive approach to annotation projection for explicit discourse relations. Our results show that (i) an approach based on a dictionary of connectives currently has advantages over a simpler approach that uses word alignments without further linguistic information, but also that (ii) bootstrapping a connective dictionary using distribution-based heuristics on aligned bitexts seems to be a feasible and low-effort way of creating such a resource. Our best method achieves an F-measure of 68.7% for the identification of discourse connectives without any German-language training data, which is a large improvement over a nontrivial baseline.
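As a rough illustration of the simpler alignment-only idea mentioned in the abstract, the sketch below copies connective labels from English tokens to the German tokens they are aligned with, then scores the result with an F-measure. It is not the paper's implementation: the function names, the alignment format (pairs of English and German token indices), and the toy sentence pair are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def project_connectives(
    en_connective_idx: Set[int],
    alignment: List[Tuple[int, int]],
) -> Set[int]:
    """Return the German token indices aligned to English connective tokens.

    `alignment` holds (english_index, german_index) pairs; this format is an
    assumption of the sketch, not something specified by the paper.
    """
    return {de for en, de in alignment if en in en_connective_idx}


def f_measure(predicted: Set[int], gold: Set[int]) -> float:
    """Balanced F-measure over sets of token indices."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sentence pair (not from the paper's corpus).
english = ["He", "stayed", "because", "it", "rained"]
german = ["Er", "blieb", ",", "weil", "es", "regnete"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5)]

# "because" (index 2) is tagged as a connective on the English side.
projected = project_connectives({2}, alignment)
print([german[i] for i in sorted(projected)])
print(f_measure(projected, gold={3}))
```

A projection this naive has no way to tell connective from non-connective uses of an ambiguous German word, which is the kind of gap the dictionary-based approach in the abstract is meant to address.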


Similar resources

Machine Translation with Many Manually Labeled Discourse Connectives

The paper presents machine translation experiments from English to Czech with a large amount of manually annotated discourse connectives. The gold-standard discourse relation annotation leads to better translation performance in ranges of 4–60% for some ambiguous English connectives and helps to find correct syntactical constructs in Czech for less ambiguous connectives. Automatic scoring confi...


Multilabel Tagging of Discourse Relations in Ambiguous Temporal Connectives

Many annotation schemes for discourse relations allow combinations such as temporal+cause (for events that are temporally and causally related to each other) and temporal+contrast (for contrasts between subsequent time spans, or between events that are temporally coextensive). However, current approaches for the automatic classification of discourse relations are limited to producing only one r...


Annotation of Discourse Connectives for the Prague Dependency Treebank

The paper presents a preliminary study on discourse connectives (DC) in Czech. Aiming to build a computerized language corpus capturing discourse relations in Czech, we base our observations on current foreign projects with the same purpose. In this study, first, the different methods of linguistic analysis of the discourse structure and discourse connectives are described, next, the nature and...


Annotation And Data Mining Of The Penn Discourse TreeBank

The Penn Discourse TreeBank (PDTB) is a new resource built on top of the Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of standoff annotation allows integration with a stand-off version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), which adds value for both linguistic discovery and discour...


Annotating Discourse Connectives In The Chinese Treebank

In this paper we examine the issues that arise from the annotation of the discourse connectives for the Chinese Discourse Treebank Project. This project is based on the same principles as the PDTB, a project that annotates the English discourse connectives in the Penn Treebank. The paper begins by outlining the range of discourse connectives under consideration in this project and examines the dist...



Publication date: 2010